On developing robust models for favourability analysis: Model choice, feature sets and imbalanced data

Abstract

Locating documents carrying positive or negative favourability is an important application within media analysis. This article presents some empirical results on the challenges facing a machine-learning approach to this kind of opinion mining. Some of the challenges include the often considerable imbalance in the distribution of positive and negative samples, changes in the documents over time, and effective training and evaluation procedures for the models. This article presents results on three data sets generated by a media-analysis company, classifying documents in two ways: detecting the presence of favourability, and assessing negative vs. positive favourability. We describe our experiments in developing a machine-learning approach to automate the classification process. We explore the effect of using five different types of features, the robustness of the models when tested on data taken from a later time period, and the effect of balancing the input data by undersampling. We find varying choices for the optimum classifier, feature set and training strategy depending on the task and data set.
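
The abstract mentions balancing the input data by undersampling but does not say how it was done. The following is a minimal sketch of plain random undersampling, assuming each document has a single class label. The function name `undersample`, the toy documents and the labels are hypothetical and are not taken from the article.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples from the larger classes until every class
    has as many samples as the smallest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is cut down to the size of the smallest one.
    target = min(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 negative documents and 2 positive ones.
docs = [f"doc{i}" for i in range(10)]
labels = ["neg"] * 8 + ["pos"] * 2

balanced = undersample(docs, labels)
counts = Counter(label for _, label in balanced)
```

In a setup like the one the abstract describes, balancing of this kind would normally be applied to the training portion only, so that the later-period test data keeps its natural class distribution.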
